On enhancing Katz-smoothing based back-off language model

Authors

  • Jian Wu
  • Fang Zheng
Abstract

Although statistical language modeling plays an important role in speech recognition, several problems remain difficult to solve, among them the sparseness of training data. Two kinds of smoothing approach, the back-off model and the interpolated model, are generally used to address the imprecision of language models caused by this sparseness. By extending the back-off idea from re-estimating only the unseen word pairs to re-estimating all word pairs, a modified back-off method is proposed, referred to as Enhanced Katz Smoothing With Deleted Interpolation (EKSWDI). A uniform expression and two simplified versions of this modified model are also given. Experiments on a Chinese pinyin-to-character conversion system, together with the perplexity measure, show that the proposed model performs better than the Katz smoothing method.
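The abstract does not reproduce the EKSWDI formulas, so the following is only an illustrative Python sketch of the general distinction it relies on: a back-off estimate consults the lower-order (unigram) model only for unseen word pairs, while an interpolated estimate mixes the two orders for every pair. The toy corpus, discount D, and interpolation weight lam are hypothetical values chosen for the example, not parameters from the paper.

# Sketch only: contrasts generic back-off vs. interpolated bigram smoothing.
# The discount D and weight lam are hypothetical, not the paper's EKSWDI values.
from collections import Counter

D = 0.5  # hypothetical absolute discount removed from each seen bigram count

tokens = "the cat sat on the mat the cat ate".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
total = sum(unigrams.values())

def p_unigram(w):
    return unigrams[w] / total

def p_backoff(w, prev):
    """Back-off style: discounted bigram estimate if the pair was seen,
    otherwise fall back to a scaled unigram estimate (leftover mass)."""
    if bigrams[(prev, w)] > 0:
        return (bigrams[(prev, w)] - D) / unigrams[prev]
    # probability mass released by discounting all seen successors of `prev`
    seen = [v for (a, _), v in bigrams.items() if a == prev]
    alpha = D * len(seen) / unigrams[prev]
    # normalise the unigram model over words never seen after `prev`
    unseen_mass = sum(p_unigram(v) for v in unigrams if bigrams[(prev, v)] == 0)
    return alpha * p_unigram(w) / unseen_mass

def p_interpolated(w, prev, lam=0.7):
    """Interpolated style: always mix the bigram ML estimate with the
    unigram estimate, for seen and unseen pairs alike."""
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_unigram(w)

for w in ("sat", "ate", "mat"):
    print(w, round(p_backoff(w, "cat"), 3), round(p_interpolated(w, "cat"), 3))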

Related articles

Improved Katz smoothing for language modeling in speech recognition

In this paper, a new method is proposed to improve the canonical Katz back-off smoothing technique in language modeling. The process of Katz smoothing is analyzed in detail and global discounting parameters are selected for discounting. Furthermore, a modified version of the formula for the discounting parameters is proposed, in which the discounting parameters are determined by not only the ...
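For context, the canonical Katz discounting that this related paper (and the enhanced model above) starts from applies a Good-Turing style discount ratio d_r to n-grams seen r times, up to a cut-off count k (with d_r = 1 above the cut-off). The textbook form of that ratio is given below purely for reference; the modified formula the paper proposes is not shown in this truncated abstract.

% Canonical Katz discount ratio (textbook form), not the modified version
% proposed in the paper; r = observed count (1 <= r <= k), n_r = number of
% n-grams occurring exactly r times, r^* = Good-Turing re-estimated count.
\[
  r^{*} = (r+1)\,\frac{n_{r+1}}{n_{r}},
  \qquad
  d_{r} = \frac{\dfrac{r^{*}}{r} - \dfrac{(k+1)\,n_{k+1}}{n_{1}}}
               {1 - \dfrac{(k+1)\,n_{k+1}}{n_{1}}}.
\]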

Back-off smoothing evaluation over syntactic language models

Continuous speech recognition systems require a language model (LM) to represent the syntactic constraints of the language. In LM development, a smoothing technique needs to be applied so that events not represented in the training corpus are also considered. In this work, several back-off smoothing approaches have been compared: classical discounting-distribution schemes including Witten-Bell, Absolute ...

Morpheme Based Language Model for Tamil Speech Recognition System

This paper describes the design of a morpheme-based language model for the Tamil language. It aims to alleviate the main problems encountered in processing Tamil, such as the enormous vocabulary growth caused by the large number of different forms derived from one word. The size of the vocabulary is reduced by decomposing the words into stems and endings and storing these sub-word units (morphemes...

Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models

The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for...


Publication date: 2000